Search CORE

215 research outputs found

Space-efficient merging of succinct de Bruijn graphs

Author: A Bowe
B Alipanahi
D Belazzougui
FA Louza
J Holt
L Egidi
MD Muggli
PA Pevzner
S Marcus
Z Iqbal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We propose a new algorithm for merging succinct representations of de Bruijn graphs introduced in [Bowe et al. WABI 2012]. Our algorithm is based on the lightweight BWT merging approach by Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. Our algorithm has the same asymptotic cost as the state-of-the-art tool for the same problem presented by Muggli et al. [bioRxiv 2017, Bioinformatics 2019], but it uses less than half of its working space. An important novel feature of our algorithm, not found in any of the existing tools, is that it can compute the Variable Order succinct representation of the union graph within the same asymptotic time/space bounds. Comment: Accepted to SPIRE'1

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology.

Author: Alipanahi B
Behsaz B
Carroll A
Cosentino J
Dorfman EH
Foster PJ
Hammel N
Hormozdiari F
Khawaja AP
McCaw ZR
McLean CY
Peng LH
Phene S
Schorsch E
Sculley D
Publication date: 01/06/2021

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time-intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.

UCL Discovery

The Parkinson's phenome-traits associated with Parkinson's disease in a broadly phenotyped cohort

Author: 23andMe Research Team
Alipanahi B
Cannon P
Fontanillas P
Heilbron K
Nalls MA
Noyce AJ
Publication date: 27/03/2019

In order to systematically describe the Parkinson's disease phenome, we performed a series of 832 cross-sectional case-control analyses in a large database. Responses to 832 online survey-based phenotypes including diseases, medications, and environmental exposures were analyzed in 23andMe research participants. For each phenotype, survey respondents were used to construct a cohort of Parkinson's disease cases and age-matched and sex-matched controls, and an association test was performed using logistic regression. Cohorts included a median of 3899 Parkinson's disease cases and 49,808 controls, all of European ancestry. Highly correlated phenotypes were removed and the novelty of each significant association was systematically assessed (assigned to one of four categories: known, likely, unclear, or novel). Parkinson's disease diagnosis was associated with 122 phenotypes. We replicated 27 known associations and found 23 associations with a strong a priori link to a known association. We discovered 42 associations that have not previously been reported. Migraine, obsessive-compulsive disorder, and seasonal allergies were associated with Parkinson's disease and tend to occur decades before the typical age of diagnosis for Parkinson's disease. The phenotypes that currently comprise the Parkinson's disease phenome have mostly been explored in relatively small purpose-built studies. Using a single large dataset, we have successfully reproduced many of these established associations and have extended the Parkinson's disease phenome by discovering novel associations. Our work paves the way for studies of these associated phenotypes that explore shared molecular mechanisms with Parkinson's disease, infer causal relationships, and improve our ability to identify individuals at high risk of Parkinson's disease.

UCL Discovery

Optical map guided genome assembly

Author: A Gurevich
A Samad
A Valouev
AK-Y Leung
B Alipanahi
BK Stöcker
DE Jarvis
ET Dimalanta
FJ Sedlazeck
H Li
HC Lin
JM Shelton
LM Mendelowitz
MD Muggli
MS Waterman
N Daccord
N Nagarajan
R Walve
S Beier
S Koren
S Vij
W Pan
Y Dong
Publication date: 06/07/2020

Background: The long reads produced by third-generation sequencing technologies have significantly improved genome assembly, but genome-wide assemblies based solely on read data still cannot be produced. Optical mapping data, for example, has therefore been used to further improve genome assemblies, but it has mostly been applied in a post-processing stage after contig assembly. Results: We propose OpticalKermit, which directly integrates genome-wide optical maps into contig assembly. We show how genome-wide optical maps can be used to localize reads on the genome, and we then adapt the Kermit method, which originally incorporated genetic linkage maps into the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome-wide optical maps into the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, compared to the Canu assembler, OpticalKermit produces an assembly with almost three times higher NGA50 and fewer misassemblies on real A. thaliana reads. Conclusions: OpticalKermit successfully incorporates optical mapping data directly into the contig assembly of eukaryotic genomes. Our results show that this is a promising approach to improve the contiguity of genome assemblies. Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease

Author: Alipanahi B.
Barash Y.
Blencowe B. J.
Bretschneider H.
Frey B. J.
Gueroussov S.
Hua Y.
Hughes T. R.
Jojic N.
Krainer A. R.
Lee L. J.
Merico D.
Morris Q.
Najafabadi H. S.
Scherer S. W.
Xiong H. Y.
Yuen R. K.
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 18/12/2014

To facilitate precision medicine and whole genome annotation, we developed a machine learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of over 650,000 intronic and exonic variants reveals widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations alter splicing nine times more often than common variants, and missense exonic disease mutations that least impact protein function are five times more likely to alter splicing than others. Tens of thousands of disease-causing mutations are detected, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole genome sequencing of individuals with autism reveals mis-spliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.

Cold Spring Harbor Laboratory Institutional Repository

A peridynamic based machine learning model for one-dimensional and two-dimensional structures

Author: A Kefal
B Alipanahi
B Kilic
BC Simonsen
BM Lake
C Diyaroglu
C Tesche
Cong Tien Nguyen
CR Farrar
CT Nguyen
D De Meo
DC Montgomery
DT Do
E Alpaydin
E Madenci
E Oterkus
Erkan Oterkus
F Bobaru
H Fan
J Kalthoff
J O’Grady
JA Mitchell
JF Kalthoff
JF Unger
JN Kutz
JT Foster
M Kim
M Kružík
MJ Borden
P Underwood
R Söderberg
S Oterkus
SA Silling
Selda Oterkus
SR Chowdhury
W Hu
W Liu
Y Bie
Y Gao
Y Hu
Y Huang
Y Jenq
Y LeCun
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/08/2020

With the rapid growth of available data and computing resources, data-driven models are a promising approach in many scientific and engineering disciplines. However, for complex physical phenomena with limited data, data-driven models lack robustness and fail to provide good predictions. Theory-guided data science is a recent approach that can take advantage of both physics-driven and data-driven models. This study presents a novel peridynamics-based machine learning model for one- and two-dimensional structures. The linear relationships between the displacement of a material point and the displacements of its family members and the applied forces are obtained for the machine learning model by using linear regression. The numerical procedure for coupling the peridynamic model and the machine learning model is also provided. The accuracy of the coupled model is verified by considering various examples of a one-dimensional bar and a two-dimensional plate. To further demonstrate the capabilities of the coupled model, damage prediction for a plate with a pre-existing crack, a two-dimensional representation of a three-point bending test, and a plate subjected to dynamic load are simulated.

Crossref

University of Strathclyde Institutional Repository

The effect of LRRK2 loss-of-function variants in humans

Author: 23andMe Research Team
Alföldi Jessica
Alipanahi Babak
Armean Irina M.
Banks Eric
Baptista Marco A.S.
Bergelson Louis
Cibulskis Kristian
Cole Joanne B.
Collins Ryan L.
Connolly Kristen M.
Covarrubias Miguel
Cummings Beryl
Daly Mark J.
Donnelly Stacey
Farjoun Yossi
Ferriera Steven
Francioli Laurent
Gabriel Stacey
Gauthier Laura D.
Genome Aggregation Database Consortium
Genome Aggregation Database Production Team
Gentry Jeff
Goodrich Julia K.
Guan Anna
Gupta Namrata
Jeandet Thibault
Kaplan Diane
Karczewski Konrad J.
Kleinman Aaron
Laricchia Kristen M.
Lehtimäki Terho
Llanwarne Christopher
Marshall Jamie L.
Mattila Kari M.
Merchant Kalpana M.
Minikel Eric V.
Morrison Peter
Munshi Ruchi
Neale Benjamin M.
Novod Sam
O’Donnell-Luria Anne H.
Petrillo Nikelle
Quaife Nicholas M.
Suvisaari Jaana
Wang Qingbo
Whiffin Nicola
Publication date: 01/01/2020

Analysis of large genomic datasets, including gnomAD, reveals that partial LRRK2 loss of function is not strongly associated with diseases, serving as an example of how human genetics can be leveraged for target validation in drug discovery. Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes(1,2). Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson's disease(3,4), suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns(5-8), the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)(9), 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work(10), confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery. Peer reviewed

Lund University Publications

Julkari

Spiral - Imperial College Digital Repository

Helsingin yliopiston digitaalinen arkisto

University of Dundee Online Publications

Trepo - Institutional Repository of Tampere University

Identification of novel risk loci for restless legs syndrome in genome-wide association studies in individuals of European ancestry : a meta-analysis

Author: Agee M
Alhenc-Gelas F
Alipanahi B
Allen RP
Auton A
Bachmann CG
Balkau B
Bell RK
Bell S
Berger K
Bonnefond A
Bonnet F
Born C
Bryc K
Butterworth AS
Caces E
Cailleau M
Cauchi S
Cogneau J
Danesh J
Dauvilliers Y
Di Angelantonio E
Dina C
Ducimetiere P
Earley CJ
Elson SL
Eschwege E
Fietze I
Fontanillas P
Franke A
Froguel P
Fumeron F
Furlotte NA
Gallois Y
Gan-Or Z
Gieger C
Girault A
Hadjigeorgiou GM
Hinds DA
Hogl B
Hornyak M
Hromatka BS
Huber KE
Kemlink D
Kleinman A
Lantieri O
Litterman NK
Marre M
McIntyre MH
Metspalu A
Montplaisir J
Moreau JG
Mountain JL
Muller-Myhsok B
Northover CA
Oertel WH
Oexle K
Ondo WG
Ouwehand WH
Paulus W
Perola M
Peters A
Pitts SJ
Poewe W
Polo O
Putz B
Rakotozafy F
Ranciere F
Roberts DJ
Ross OA
Rouleau GA
Roussel R
Salminen AV
Sathirapongsasuti JF
Sazonova OV
Schormair B
Shah SH
Shelton JF
Shringarpure S
Sonka K
Soranzo N
Stefani A
Stewart AFR
Teder-Laving M
Tian C
Tichet J
Tilch E
Tittmann L
Trenkwalder C
Tung JY
Vacic V
Vodicka P
Vol S
Wilson CH
Winkelmann J
Wszolek Z
Xiong L
Zhao C
Publication date: 13/10/2017

Background: Restless legs syndrome is a prevalent chronic neurological disorder with potentially severe mental and physical health consequences. Clearer understanding of the underlying pathophysiology is needed to improve treatment options. We did a meta-analysis of genome-wide association studies (GWASs) to identify potential molecular targets. Methods: In the discovery stage, we combined three GWAS datasets (EU-RLS GENE, INTERVAL, and 23andMe) with diagnosis data collected from 2003 to 2017, in face-to-face interviews or via questionnaires, and involving 15,126 cases and 95,725 controls of European ancestry. We identified common variants by fixed-effect inverse-variance meta-analysis. Significant genome-wide signals (p … Findings: We identified and replicated 13 new risk loci for restless legs syndrome and confirmed the previously identified six risk loci. MEIS1 was confirmed as the strongest genetic risk factor for restless legs syndrome (odds ratio 1.92, 95% CI 1.85-1.99). Gene prioritisation, enrichment, and genetic correlation analyses showed that identified pathways were related to neurodevelopment and highlighted genes linked to axon guidance (associated with SEMA6D), synapse formation (NTNG1), and neuronal specification (HOXB cluster family and MYT1). Interpretation: Identification of new candidate genes and associated pathways will inform future functional research. Advances in understanding of the molecular mechanisms that underlie restless legs syndrome could lead to new treatment options. We focused on common variants; thus, additional studies are needed to dissect the roles of rare and structural variations. Peer reviewed

University of Liverpool Repository

Julkari

GoeScholar The Publication Server of the Georg-August-Universität Göttingen

Helsingin yliopiston digitaalinen arkisto

Apollo (Cambridge)

ART: A machine learning Automated Recommendation Tool for synthetic biology

Author: A Espah Borujeni
A Esteva
AJ Jervis
AL Meadows
B Alipanahi
CE Hodgman
CG Begley
CJ Paddon
CJ Petzold
CM Denby
D Wolpert
DE Cameron
E Begoli
EC Hayden
F Pedregosa
F Prinz
G Renouard-Vallet
G Stephanopoulos
HM Salis
HR Beller
I Shaked
J Alonso-Gutierrez
J Heinemann
J Nielsen
JA Doudna
JA Hoeting
JD Keasling
JM Granda
JV Kurian
K Kyrou
K Le
K Magnuson
L Breiman
M Baker
M HamediRad
M Kosinski
MD McKay
MM Noack
MT Bonde
NI Tracy
P Carbonell
P Opgenorth
PC Gach
PK Ajikumar
S Ma
S Unthan
S Van Dien
T Fuhrer
TS Batth
TS Gardner
V Chubukov
VG Yadav
W Duetz
WC Morrell
Y Chen
Y Yao
Z Costello
ZD Stephens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy-flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing.

arXiv.org e-Print Archive

Crossref

BCAM's Institutional Repository Data

eScholarship - University of California

Genomewide Association Studies of LRRK2 Modifiers of Parkinson's Disease

Author: Aasly J
Agee M
Alcalay RN
Alipanahi B
Auton A
Beecham GW
Bell RK
Berg D
Bressman S
Brice A
Brockman K
Bryc K
Cannon P
Clark L
Cookson M
Das S
Elson SL
Fah Sathirapongsasuti J
Farrer MJ
Fiske B
Follett J
Fontanillas P
Foroud T
Furlotte NA
Gasser T
Giladi N
Goldwurm S
Gustavsson E
Hinds DA
Huber KE
Klein C
Kleinman A
Lai D
Lang AE
Langston JW
Latourelle J
Litterman NK
Lynch T
Marder K
Marras C
Martin ER
McIntyre MH
McLean CY
Mejia-Santana H
Mirelman A
Molho E
Mountain JL
Myers RH
Noblin ES
Northover CAM
Nuytemans K
Orr Urtreger A
Ozelius L
Payami H
Pitts SJ
Raymond D
Rogaeva E
Rogers MP
Ross OA
Samii A
Saunders-Pullman R
Sazonova OV
Schulte C
Schwantes-An TH
Schüle B
Scott WK
Shelton JF
Shringarpure S
Tanner C
Tian C
Tolosa E
Tomkins JE
Trinh J
Trojanowski JQ
Tung JY
Uitti R
Vacic V
Van Deerlin V
Vance JM
Vilas D
Visanji NP
Wilson CH
Wszolek ZK
Zabetian CP
Publication date: 01/07/2021

Objective: The aim of this study was to search for genes/variants that modify the effect of LRRK2 mutations in terms of penetrance and age-at-onset of Parkinson's disease. // Methods: We performed the first genomewide association study of penetrance and age-at-onset of Parkinson's disease in LRRK2 mutation carriers (776 cases and 1,103 non-cases at their last evaluation). Cox proportional hazard models and linear mixed models were used to identify modifiers of penetrance and age-at-onset of LRRK2 mutations, respectively. We also investigated whether a polygenic risk score derived from a published genomewide association study of Parkinson's disease was able to explain variability in penetrance and age-at-onset in LRRK2 mutation carriers. // Results: A variant located in the intronic region of CORO1C on chromosome 12 (rs77395454; p value = 2.5E-08, beta = 1.27, SE = 0.23, risk allele: C) met genomewide significance for the penetrance model. Co-immunoprecipitation analyses of LRRK2 and CORO1C supported an interaction between these 2 proteins. A region on chromosome 3, within a previously reported linkage peak for Parkinson's disease susceptibility, showed suggestive associations in both models (penetrance top variant: p value = 1.1E-07; age-at-onset top variant: p value = 9.3E-07). A polygenic risk score derived from publicly available Parkinson's disease summary statistics was a significant predictor of penetrance, but not of age-at-onset. // Interpretation: This study suggests that variants within or near CORO1C may modify the penetrance of LRRK2 mutations. In addition, common Parkinson's disease-associated variants collectively increase the penetrance of LRRK2 mutations.

UCL Discovery